Abstract

This paper presents a method for the automatic extraction of
subgrammars to control and speed up natural language generation (NLG).
The method is based on explanation-based learning (EBL). Its main
advantage for NLG is that the complexity of the grammatical decision
making process during generation can be vastly reduced, because the
EBL method supports the adaptation of an NLG system to a particular
use of a language.

1 Introduction 

In recent years, a machine learning technique known as
explanation-based learning (EBL) (Mitchell, Keller, and Kedar-Cabelli,
1986; van Harmelen and Bundy, 1988; Minton et al., 1986) has been
successfully applied to control and speed up natural language parsing
(Rayner, 1988; Samuelsson and Rayner, 1991; Neumann, 1994a;
Samuelsson, 1994; Srinivas and Joshi, 1995; Rayner and Carter, 1996).
The core idea of EBL is to transform the derivations (or explanations)
computed by a problem solver (e.g., a parser) into generalized and
compact forms, which can be used very efficiently for solving similar
problems in the future. EBL has primarily been used in parsing to
automatically specialize a given source grammar to a specific domain.
In that case, EBL serves as a method for adapting a general grammar
and/or parser to the sub-language defined by a suitable training
corpus (Rayner and Carter, 1996).

A specialized grammar can be seen as describ- 
ing a domain-specific set of prototypical construc- 
tions. Therefore, the EBL approach is also very 
interesting for natural language generation (NLG). 
Informally, NLG is the production of a natural language text from a
computer-internal representation of information, where NLG can be seen
as a complex, potentially cascaded, decision making process.
Commonly, an NLG system is decomposed
into two major components, viz. the strategic com- 
ponent which decides 'what to say' and the tacti- 
cal component which decides 'how to say' the result 
of the strategic component. The input of the tacti- 
cal component is basically a semantic representation 
computed by the strategic component. Using a lexi- 
con and a grammar, its main task is the computation 
of potentially all possible strings associated with a 
semantic input. Just as EBL is used in parsing to control the range
of possible strings and their degree of ambiguity, it can be used in
the tactical component to control the range of possible semantic
inputs and their degree of paraphrasing.

In this paper, we present a novel method for the automatic extraction
of subgrammars for controlling and speeding up natural language
generation. Its main advantage for NLG is that the complexity of the
(linguistically oriented) decision making process during generation
can be vastly reduced, because the EBL method supports the adaptation
of an NLG system to a particular language use. The core properties of
this new method are:

• prototypically occurring grammatical constructions can be extracted
automatically;

• generation of these constructions is vastly sped 
up using simple but efficient mechanisms; 

• the new method supports partial matching, in 
the sense that new semantic input need not be 
completely covered by previously trained exam- 
ples; 

• it can easily be integrated with recently developed chart-based
generators as described in, e.g., (Neumann, 1994b; Kay, 1996;
Shemtov, 1996).

The method has been completely implemented and tested with a
broad-coverage HPSG-based grammar for English (see sec. 5 for more
details).

2 Foundations 



The main focus of this paper is tactical generation, i.e., the
mapping of structures (usually representing semantic information,
possibly decorated with some functional features) to strings using a
lexicon and a grammar. Thus stated, we view tactical generation as the
inverse process of parsing. Informally, EBL can be considered an
intelligent storage unit of example-based generalized parts of the
grammatical search space determined via training by the tactical
generator. Processing of similar new input is then reduced to simple
lookup and matching operations, which circumvent re-computation of
this already known search space.

We concentrate on constraint-based grammar formalisms following a
sign-based approach, which considers linguistic objects (i.e., words
and phrases) as utterance-meaning associations (Pollard and Sag,
1994). Thus viewed, a grammar is a formal statement of the relation
between utterances in a natural language and representations of their
meanings in some logical or other artificial language, where such
representations are usually called logical forms (Shieber, 1995). The
result of the tactical generator is a feature structure (or a set of
such structures in the case of multiple paraphrases) containing, among
other things, the input logical form, the computed string, and a
representation of the derivation.

In our current implementation we are using TDL, a typed feature-based
language and inference system for constraint-based grammars (Krieger
and Schäfer, 1994). TDL allows the user to define hierarchically
ordered types consisting of type and feature constraints. As shown
later, a systematic use of type information leads to a very compact
representation of the extracted data and supports an elegant but
efficient generalization step.

We are adopting a "flat" representation of logical forms as described
in (Kay, 1996; Copestake et al., 1996). This is a minimally
structured, but descriptively adequate means to represent semantic
information, which allows for various types of
under-/overspecification, and facilitates generation and the
specification of semantic transfer equivalences used for machine
translation (Copestake et al., 1996; Shemtov, 1996).

1 In case a reversible grammar is used the parser can
even be used for processing the training corpus.

Informally, a flat representation is obtained by the use of extra
variables which explicitly represent the relationship between the
entities of a logical form and scope information. In our current
system we are using the framework called minimal recursion semantics
(MRS) described in (Copestake et al., 1996). Using their typed feature
structure notation, figure 1 displays a possible MRS of the string
"Sandy gives a chair to Kim" (abbreviated where convenient).

The value of the feature LISZT is actually treated like a set, i.e.,
the relative order of the elements is immaterial. The feature HANDEL
is used to represent scope information, and INDEX plays much the same
role as a lambda variable in conventional representations (for more
details see (Copestake et al., 1996)).



3 Overview of the method 
Figure 3: A blueprint of the architecture.



The above figure displays the overall architecture 
of the EBL learning method. The right-hand part 
of the diagram shows the linguistic competence base 
(LCB) and the left the EBL-based subgrammar pro- 
cessing component (SGP). 

LCB corresponds to the tactical component of a general natural
language generation (NLG) system. In this paper we assume that the
strategic component of the NLG system has already computed the MRS
representation of the information of an underlying computer program.
SGP consists of a training module TM, an application module AM, and
the subgrammar, automatically determined by TM and applied by AM.

2 But note, our approach does not depend on a flat representation of
logical forms. However, in the case of a conventional representation
form, the mechanisms for indexing the trained structures would require
more complex abstract data types (see below for more details).

Figure 1: The MRS of the string "Sandy gives a chair to Kim"

Figure 2: The generalized MRS of the string "Sandy gives a chair to Kim"

Briefly, the flow of control is as follows: During the training phase
of the system, a new logical form mrs is given as input to the LCB.
After grammatical processing, the resulting feature structure fs(mrs)
(i.e., a feature structure that contains, among other things, the
input MRS, the computed string, and a representation of the derivation
tree) is passed to TM. TM extracts and generalizes the derivation tree
of fs(mrs), which we call the template templ(mrs) of fs(mrs).
templ(mrs) is then stored in a decision tree, where indices are
computed from the MRS found under the root of templ(mrs). During the
application phase, a new semantic input mrs' is used for retrieval
from the decision tree. If a candidate template can be found and
successfully instantiated, the resulting feature structure fs(mrs')
constitutes the generation result of mrs'.

Thus described, the approach seems to facilitate only exact retrieval
and matching of a new semantic input. However, before we describe how
partial matching is realized, we will demonstrate the exact matching
strategy in more detail using the example MRS shown in figure 1.

Training phase The training module TM starts right after the
resulting feature structure fs for the input MRS mrs has been
computed. In the first phase, TM extracts and generalizes the
derivation tree of fs, called the template of fs. Each node of the
template contains the rule name used in the corresponding derivation
step and a generalization of the local MRS. A generalized MRS is the
abstraction of the LISZT value of an MRS where each element only
contains the (lexical semantic) type and HANDEL information (the
HANDEL information is used for directing lexical choice, see below).

For our example mrs, figure 2 displays the generalized MRS mrs_g. For
convenience, we will use the more compact notation:

{(SandyRel h4), (GiveRel h1),
(TempOver h1), (Some h9),
(ChairRel h10), (To h12), (KimRel h14)}

Using this notation, figure 4 displays the template templ(mrs)
obtained from fs. Note that it memorizes not only the rule application
structure of a successful process but also the way the grammar
mutually relates the compositional parts of the input MRS.

In the next step of the training module TM, the generalized MRS mrs_g
of the root node of templ(mrs) is used for building up an index in a
decision tree. Remember that the relative order of the elements of an
MRS is immaterial. For that reason, the elements of mrs_g are
alphabetically ordered, so that we can treat it as a sequence when
used as a new index in the decision tree.
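The generalization and indexing steps can be sketched as follows. This is a minimal illustration, not the TDL implementation: the dictionary layout of the LISZT elements, the names `Rel`, `generalize`, and `index_key`, and the choice to order elements by type name alone are assumptions made for the example.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    type: str    # lexical semantic type, e.g. "GiveRel"
    handel: str  # scope handle, e.g. "h1"

def generalize(liszt):
    """Abstract a LISZT to a set of (type, HANDEL) pairs (assumed dict layout)."""
    return {Rel(elem["type"], elem["handel"]) for elem in liszt}

def index_key(mrs_g):
    """Order the element types alphabetically so the set can be used as a sequence."""
    return tuple(sorted(rel.type for rel in mrs_g))

# Elements of the example MRS; any further features would be dropped by generalize().
LISZT = [
    {"type": "SandyRel", "handel": "h4"},
    {"type": "GiveRel", "handel": "h1"},
    {"type": "TempOver", "handel": "h1"},
    {"type": "Some", "handel": "h9"},
    {"type": "ChairRel", "handel": "h10"},
    {"type": "To", "handel": "h12"},
    {"type": "KimRel", "handel": "h14"},
]

key = index_key(generalize(LISZT))
```

Because the key is sorted, any permutation of `LISZT` yields the same `key`, which is what allows the set-valued LISZT to be treated as a sequence.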

The alphabetic ordering has two advantages. Firstly, we can store
different templates under a common prefix, which allows for efficient
storage and retrieval. Secondly, it allows for a simple and efficient
treatment of MRS as sets during the retrieval step of the application
phase.
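The prefix sharing mentioned above is the behaviour of a trie. The following sketch assumes that the decision tree can be modelled as a plain trie keyed by type names; `Node`, `insert`, and `lookup_exact` are hypothetical names, the template values are placeholders, and subsumption-directed traversal is left out.

```python
class Node:
    def __init__(self):
        self.children = {}   # type name -> Node
        self.templates = []  # templates stored at the end of an index

def insert(root, key, template):
    """Store a template under its sorted index, sharing common prefixes."""
    node = root
    for symbol in key:
        node = node.children.setdefault(symbol, Node())
    node.templates.append(template)

def lookup_exact(root, key):
    """Follow the key from the root; return the templates at its end node."""
    node = root
    for symbol in key:
        node = node.children.get(symbol)
        if node is None:
            return []
    return node.templates

root = Node()
insert(root, ("ChairRel", "Some"), "templ_np")
insert(root, ("ChairRel", "GiveRel", "Some"), "templ_vp")
```

Both indices start with `ChairRel`, so the root has a single child and the two templates are stored under a shared prefix.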



Figure 4: The template templ(mrs). Rule names are in bold.



Application phase The application module AM basically performs the
following steps:

1. Retrieval: For a new MRS mrs' we first construct the alphabetically
sorted generalized MRS mrs'_g. mrs'_g is then used as a path
description for traversing the decision tree. For reasons we will
explain soon, traversal is directed by type subsumption. Traversal is
successful if mrs'_g has been completely processed and if the end node
in the decision tree contains a template. Note that because of the
alphabetic ordering, the relative order of the elements of the new
input mrs' is immaterial.

2. Expansion: A successfully retrieved template templ is expanded by
deterministically applying the rules denoted by the non-terminal
elements from the top downwards in the order specified by templ. In
some sense, expansion just replays the derivation obtained in the
past. This results in a grammatically fully expanded feature
structure, where only lexically specific information is still missing.
But note that through structure sharing the terminal elements will
already be constrained by syntactic information.



3 It is possible to perform the expansion step off-line 
as early as the training phase, in which case the applica- 
tion phase can be sped up, however at the price of more 
memory being taken up. 



3. Lexical lookup: From each terminal element of the unexpanded
template templ the type and HANDEL information is used to select the
corresponding element from the input MRS mrs' (note that in general
the MRS elements of mrs' are much more constrained than their
corresponding elements in the generalized MRS mrs_g). The chosen
input MRS element is then used for performing lexical lookup, where
lexical elements are indexed by their relation name. In general this
will lead to a set of lexical candidates.

4. Lexical instantiation: In the last step of the ap- 
plication phase, the set of selected lexical el- 
ements is unified with the constraints of the 
terminal elements in the order specified by the 
terminal yield. We also call this step terminal- 
matching. In our current system terminal- 
matching is performed from left to right. Since 
the ordering of the terminal yield is given by the 
template, it is also possible to follow other se- 
lection strategies, e.g., a semantic head-driven 
strategy, which could lead to more efficient 
terminal-matching, because the head element is 
supposed to provide selectional restriction in- 
formation for its dependents. 
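The lexical lookup step (step 3) can be illustrated with a small sketch. Everything here is an assumption made for illustration: input MRS elements are plain dictionaries, the relation name is taken to be the type string, handles of the input are assumed to be directly comparable with those stored in the template, and the toy `LEXICON` maps relation names to word forms rather than to feature structures.

```python
def select_element(terminal, input_liszt):
    """Pick the input MRS element that matches a terminal's type and HANDEL."""
    t_type, t_handel = terminal
    for elem in input_liszt:
        if elem["type"] == t_type and elem["handel"] == t_handel:
            return elem
    return None

def lexical_lookup(terminals, input_liszt, lexicon):
    """Return one list of lexical candidates per terminal element."""
    candidates = []
    for terminal in terminals:
        elem = select_element(terminal, input_liszt)
        candidates.append(lexicon.get(elem["type"], []) if elem else [])
    return candidates

# Hypothetical lexicon indexed by relation name; several readings are possible.
LEXICON = {"GiveRel": ["give", "gives"], "KimRel": ["Kim"]}

INPUT = [
    {"type": "GiveRel", "handel": "h1", "tense": "pres"},
    {"type": "KimRel", "handel": "h14"},
]

result = lexical_lookup([("GiveRel", "h1"), ("KimRel", "h14")], INPUT, LEXICON)
```

A terminal with several candidates, such as `GiveRel` here, is where lexical non-determinism enters; in the method described above the candidates are then narrowed down by unification in step 4.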

A template together with its corresponding index 
describes all sentences of the language that share 
the same derivation and whose MRS are consistent 
with that of the index. Furthermore, the index and 
the MRS of a template together define a normaliza- 
tion for the permutation of the elements of a new 
input MRS. The proposed EBL method guarantees 
soundness because retaining and applying the orig- 
inal derivation in a template enforces the full con- 
straints of the original grammar. 

Achieving more generality So far, the applica- 
tion phase will only be able to re-use templates for 
a semantic input which has the same semantic type 
information. However, it is possible to achieve more 
generality, if we apply a further abstraction step on 
a generalized MRS. This is simply achieved by se- 
lecting a supertype of a MRS element instead of the 
given specialized type. 

The type abstraction step is based on the stan- 
dard assumption that the word-specific lexical se- 
mantic types can be grouped into classes represent- 
ing morpho-syntactic paradigms. These classes de- 
fine the upper bounds for the abstraction process. In 
our current system, these upper bounds are directly 
used as the supertypes to be considered during the 
type abstraction step. More precisely, for each element x of a
generalized MRS mrs_g it is checked whether its type T_x is subsumed
by an upper bound T_s (we assume disjoint sets). Only if this is the
case, T_s replaces T_x in mrs_g. Applying this type abstraction
strategy to the MRS of our example, we obtain:

{(Named h4), (ActUndPrep h1),
(TempOver h1), (Some h9),
(RegNom h10), (To h12), (Named h14)}

Figure 5: The more generalized derivation tree dt_g of dt.

where, e.g., Named is the common supertype of SandyRel and KimRel,
and ActUndPrep is the supertype of GiveRel. Figure 5 shows the
template templ_g obtained from fs using the more general MRS
information. Note that the MRS of the root node is used for building
up an index in the decision tree.

Now, if retrieval from the decision tree is directed by type
subsumption, the same template can be retrieved and potentially
instantiated for a wider range of new MRS inputs, namely for those
which are type compatible with respect to the subsumption relation.
Thus, the template templ_g can now be used to generate, e.g., the
string "Kim gives a table to Peter", as well as the string "Noam
donates a book to Peter".



4 Of course, if a very fine-grained lexical semantic type hierarchy is
defined, then a more careful selection would be possible in order to
obtain different degrees of type abstraction and to achieve a more
domain-sensitive determination of the subgrammars. However, more
complex type abstraction strategies are then needed which would be
able to find appropriate supertypes automatically.



However, it will not be able to generate a sentence like "A man gives
a book to Kim", since the retrieval phase will already fail. In the
next section, we will show how to overcome even this kind of
restriction.
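The effect of the type abstraction step can be sketched with a toy hierarchy. The `PARENT` table below is a hypothetical fragment (only the supertypes Named, ActUndPrep, and RegNom come from the text; entries such as `ManRel` or `DonateRel` are assumed), and instead of subsumption-directed traversal the sketch uses a simpler stand-in: it abstracts the types of the new input as well and compares the sorted keys.

```python
# Hypothetical fragment of a type hierarchy: type -> direct supertype.
PARENT = {
    "SandyRel": "Named", "KimRel": "Named", "PeterRel": "Named",
    "GiveRel": "ActUndPrep", "DonateRel": "ActUndPrep",
    "ChairRel": "RegNom", "TableRel": "RegNom",
    "BookRel": "RegNom", "ManRel": "RegNom",
}
UPPER_BOUNDS = {"Named", "ActUndPrep", "RegNom"}  # assumed disjoint classes

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def abstract(type_name):
    """Replace a type by the upper bound that subsumes it, if there is one."""
    for bound in UPPER_BOUNDS:
        if subsumes(bound, type_name):
            return bound
    return type_name

def abstract_key(types):
    """Sorted index built from the abstracted types."""
    return tuple(sorted(abstract(t) for t in types))

trained = abstract_key(
    ["SandyRel", "GiveRel", "TempOver", "Some", "ChairRel", "To", "KimRel"])
kim_table = abstract_key(
    ["KimRel", "GiveRel", "TempOver", "Some", "TableRel", "To", "PeterRel"])
a_man = abstract_key(
    ["Some", "ManRel", "GiveRel", "TempOver", "Some", "BookRel", "To", "KimRel"])
```

Under these assumptions `kim_table` equals `trained`, so the stored template is found, while `a_man` has a different multiset of types and exact retrieval fails, as described above.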

4 Partial Matching 

The core idea behind partial matching is that in case an exact match
of an input MRS fails we want at least as many subparts as possible to
be instantiated. Since the instantiated template of an MRS subpart
corresponds to a phrasal sign, we also call it a phrasal template. For
example, assuming that the training phase has only been performed for
the example in figure 1, then for the MRS of "A man gives a book to
Kim", a partial match would generate the strings "a man" and "gives a
book to Kim". The instantiated phrasal templates are then combined by
the tactical component to produce larger units (if possible, see
below).

Extended training phase The training module is adapted as follows:
Starting from a template templ obtained for the training example in
the manner described above, we recursively extract all possible
subtrees templ_s, also called phrasal templates. Next, each phrasal
template is inserted in the decision tree in the way described above.

It is possible to direct the subtree extraction pro- 
cess with the application of filters, which are ap- 
plied to the whole remaining subtree in each recur- 
sive step. By using these filters it is possible to re- 
strict the range of structural properties of candidate 
phrasal templates (e.g., extract only saturated NPs, 
or subtrees having at least two daughters, or sub- 
trees which have no immediate recursive structures). 
These filters serve the same purpose as the "chunking
criteria" described in (Rayner and Carter, 1996).



During the training phase it is recognized for each phrasal template
templ_s whether the decision tree already contains a path pointing to
a previously extracted and already stored phrasal template templ'_s,
such that templ_s = templ'_s. In that case, templ_s is not inserted
and the recursion stops at that branch.
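The recursive extraction with filters and duplicate detection can be sketched as follows. The `Template` class, the `extract` function, and the use of a dictionary in place of the decision tree (one template per index) are simplifying assumptions; the small example tree reuses the rule names DetN, DetSgLe, and IntrNLe from the example derivation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    rule: str              # rule name used at this node
    types: tuple           # sorted types of the node's generalized MRS
    daughters: tuple = ()  # sub-templates, empty for lexical nodes

def extract(template, store, keep=lambda t: True):
    """Insert a template and, recursively, its subtrees into `store`."""
    if store.get(template.types) == template:
        return  # already stored: stop the recursion at this branch
    if keep(template):
        store[template.types] = template
    for daughter in template.daughters:
        extract(daughter, store, keep)

NP = Template("DetN", ("RegNom", "Some"), (
    Template("DetSgLe", ("Some",)),
    Template("IntrNLe", ("RegNom",)),
))

all_subtrees = {}
extract(NP, all_subtrees)

# Filter: keep only subtrees having at least two daughters.
branching_only = {}
extract(NP, branching_only, keep=lambda t: len(t.daughters) >= 2)
```

Calling `extract(NP, all_subtrees)` a second time leaves the store unchanged, since the root is recognized as already stored and its subtrees are not visited again.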

Extended application phase For the application module, only the
retrieval operation of the decision tree need be adapted.

Remember that the input of the retrieval operation is the sorted
generalized MRS mrs_g of the input MRS mrs. Therefore, mrs_g can be
handled like a sequence. The task of the retrieval operation in the
case of a partial match is now to potentially find all subsequences of
mrs_g which lead to a template.

5 If we allowed for an exhaustive partial match (see below), then the
strings "a book" and "Kim" would additionally be generated.

With the exact matching strategy, the decision tree must be visited
only once for a new input. In the case of partial matching, however,
the decision tree describes only possible prefixes for a new input.
Hence, we have to recursively repeat retrieval from the decision tree
as long as the remaining suffix is not empty. In other words, the
decision tree is now a finite representation of an infinite structure,
because implicitly, each endpoint of an index bears a pointer to the
root of the decision tree.

Assume that the following template/index pairs have been inserted
into the decision tree: (ab, t1), (abcd, t2), (bcd, t3). Then
retrieval using the path abcd will return all three templates,
retrieval using aabbcd will return templates t1 and t3, and abc will
only return t1.
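The retrieval example can be reproduced with a small sketch. It assumes an exhaustive strategy in which retrieval is restarted at every position of the input sequence and every complete index met along the way contributes its template; this is one reading that is consistent with the three cases in the text, not necessarily how the original system organizes the recursion. The names `build` and `retrieve_partial` and the nested-dictionary trie are hypothetical.

```python
def build(pairs):
    """Build a trie from (index, template) pairs; the key None marks an endpoint."""
    root = {}
    for key, template in pairs:
        node = root
        for symbol in key:
            node = node.setdefault(symbol, {})
        node[None] = template
    return root

def retrieve_partial(root, seq):
    """Collect the templates of all contiguous parts of seq that spell an index."""
    found = []
    for start in range(len(seq)):
        node = root
        for symbol in seq[start:]:
            node = node.get(symbol)
            if node is None:
                break  # no index continues with this symbol
            if None in node:
                found.append(node[None])
    return found

tree = build([("ab", "t1"), ("abcd", "t2"), ("bcd", "t3")])
```

With this tree, `retrieve_partial(tree, "abcd")` yields all three templates, `"aabbcd"` yields `t1` and `t3`, and `"abc"` yields only `t1`. A non-exhaustive variant would keep only the longest match found from each starting position.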

Interleaving with normal processing Our EBL method can easily be
integrated with normal processing, because each instantiated template
can be used directly as an already found sub-solution. In case of an
agenda-driven chart generator of the kind described in (Neumann,
1994a; Kay, 1996), an
instantiated template can be directly added as a 
passive edge to the generator's agenda. If passive 
edges with a wider span are given higher priority 
than those with a smaller span, the tactical gener- 
ator would try to combine the largest derivations 
before smaller ones, i.e., it would prefer those struc- 
tures determined by EBL. 
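The preference for wider passive edges can be sketched with a priority queue. The `Agenda` class is hypothetical, edges are represented by strings, and the span is assumed to be the number of semantic elements an edge covers.

```python
import heapq

class Agenda:
    """Agenda that hands out passive edges with the widest span first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker: equal spans leave in insertion order

    def add(self, edge, span):
        # heapq is a min-heap, so the span is negated to pop the widest edge first.
        heapq.heappush(self._heap, (-span, self._count, edge))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

agenda = Agenda()
agenda.add("a chair", 2)
agenda.add("gives a chair to Kim", 6)  # instantiated phrasal template
agenda.add("Sandy", 1)

order = [agenda.pop() for _ in range(3)]
```

The instantiated template with the widest span leaves the agenda first, so the generator would try to combine it before the smaller edges.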

5 Implementation 

The EBL method just described has been fully im- 
plemented and tested with a broad coverage HPSG- 
based English grammar including more than 2000 
fully specified lexical entries. The TDL grammar
formalism is very powerful, supporting distributed 
disjunction, full negation, as well as full boolean type 
logic. 

In our current system, an efficient chart-based bidirectional parser
is used for performing the training phase. During training, the user
can interactively select which of the parser's readings should be
considered by the EBL module. In this way the user can control which
sort of structural ambiguities should be avoided because they are
known to cause misunderstandings. For interleaving the EBL application
phase with normal processing, a first prototype of a chart generator
has been implemented using the same grammar as used for parsing.

6 It is possible to parameterize our system to perform an exhaustive
or a non-exhaustive strategy. In the non-exhaustive mode, the longest
matching prefixes are preferred.

7 This grammar has been developed at CSLI, Stanford, and was kindly
provided to the author.

First tests have been carried out using a small test set of 179
sentences. Currently, a parser is used for processing the test set
during training. Generation of the extracted templates is performed
solely by the EBL application phase (i.e., we did not consider
integration of EBL and chart generation). The application phase is
very efficient. The average processing time for indexing and
instantiation of a sentence-level template (determined through
parsing) of an input MRS is approximately one second. Compared to
parsing the corresponding string, the speed-up factor is between 10
and 20. A closer look at the four basic EBL-generation steps
(indexing, instantiation, lexical lookup, and terminal matching)
showed that the last of these is the most expensive one (up to 70% of
computing time). The main reasons are that 1.) lexical lookup often
returns several lexical readings for an MRS element (which introduces
lexical non-determinism) and 2.) the lexical elements introduce most
of the disjunctive constraints, which makes unification very complex.
Currently, terminal matching is performed left to right. However, we
hope to increase the efficiency of this step by using head-oriented
strategies, since this might help to resolve disjunctive constraints
as early as possible.

6 Discussion 

The only other approach I am aware of which also considers EBL for
NLG is (Samuelsson, 1995a; Samuelsson, 1995b). However, he focuses on
the compilation of a logic grammar using LR-compiling techniques,
where EBL-related methods are used to optimize the compiled LR tables,
in order to avoid spurious non-determinisms during normal generation.
He considers neither the extraction of a specialized grammar for
supporting controlled language generation, nor strong integration with
the normal generator.

However, these properties are very important for achieving high
applicability. Automatic grammar extraction is worthwhile because it
can be used to support the definition of a controlled domain-specific
language use on the basis of training with a general source grammar.
Furthermore, in case exact matching is requested, only the application
module is needed for processing the subgrammar. In case of normal
processing, our EBL method serves as a speed-up mechanism for those
structures which have "actually been used or uttered". However,
completeness is preserved.

8 EBL-based generation of all possible templates of an input MRS takes
less than 2 seconds. The tests have been performed using a Sun
UltraSparc.

We view generation systems which are based on 
"canned text" and linguistically-based systems sim- 
ply as two endpoints of a contiguous scale of possible 
system architectures (see also (Dale et al., 1994)). 
Thus viewed, our approach is directed towards the 
automatic creation of application-specific generation 
systems. 

7 Conclusion and Future Directions 

We have presented a method of automatic extrac- 
tion of subgrammars for controlling and speeding up 
natural language generation (NLG). The method is 
based on explanation-based learning (EBL), which 
has already been successfully applied for parsing. 
We showed how the method can be used to train a system for a specific
grammatical and lexical usage.

We have already implemented a similar EBL method for parsing, which
supports on-line learning as well as statistics-based management of
extracted data. In the future we plan to combine EBL-based generation
and parsing into one uniform EBL approach usable for high-level
performance strategies which are based on a strict interleaving of
parsing and generation (cf. (Neumann and van Noord, 1994; Neumann,
1994a)).